
    DataCell: Exploiting the Power of Relational Databases for Efficient Stream Processing

    Get PDF
    Designed for complex event processing, DataCell is a research prototype database system in the area of sensor stream systems. Under development at CWI, it belongs to the MonetDB database system family. CWI researchers built a stream engine directly on top of a database kernel, thus exploiting and merging technologies from the stream world and the rich body of database literature. The results are very promising.

    Evaluating Conjunctive Triple Pattern Queries over Large Structured Overlay Networks

    Get PDF
    We study the problem of evaluating conjunctive queries composed of triple patterns over RDF data stored in distributed hash tables. Our goal is to develop algorithms that scale to large amounts of RDF data, distribute the query processing load evenly and incur little network traffic. We present and evaluate two novel query processing algorithms with these possibly conflicting goals in mind. We discuss the various tradeoffs that occur in our setting through a detailed experimental evaluation of the proposed algorithms.
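
    The abstract does not spell out the placement scheme, but a standard approach for RDF over DHTs is to index each triple under its subject, predicate and object, so that any pattern with at least one constant can be routed to a single node. The Python sketch below illustrates only that idea; the node count, keys and data are hypothetical and this is not the paper's actual algorithms.

        import hashlib

        NUM_NODES = 16  # hypothetical DHT size

        def node_for(key):
            # Consistent placement: hash a key onto one of the DHT nodes.
            return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_NODES

        def index_triple(dht, s, p, o):
            # Store each triple under its subject, predicate and object keys,
            # so any pattern with at least one constant reaches one node.
            for key in (s, p, o):
                dht.setdefault(node_for(key), set()).add((s, p, o))

        def match(dht, pattern):
            # pattern: (s, p, o) with None as wildcard; needs >= 1 constant.
            key = next(term for term in pattern if term is not None)
            return [t for t in dht.get(node_for(key), set())
                    if all(q is None or q == v for q, v in zip(pattern, t))]

        dht = {}
        index_triple(dht, ":sensor1", ":hasReading", '"17.5"')
        index_triple(dht, ":sensor1", ":locatedIn", ":room42")
        print(match(dht, (":sensor1", None, None)))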

    Exploiting the Power of Relational Databases for Efficient Stream Processing

    Get PDF
    Stream applications have gained significant popularity over the last years, which has led to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system, which exploits algorithms and techniques accumulated over many years of database research. In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored directly upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single query plan or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible load sharing and query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after x tuples arrive or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expression. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms using both micro benchmarks and the standard Linear Road benchmark demonstrate the potential of this new approach.
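
    The following toy Python sketch illustrates the basket idea as described above: tuples are buffered on arrival, a continuous query is evaluated over the buffer as an ordinary one-time query, and a tuple is dropped once every registered query has seen it. Class and query names are invented for illustration; this is not DataCell's actual implementation, which lives inside the MonetDB kernel.

        from collections import defaultdict

        class Basket:
            """Toy model of a basket: a table that buffers incoming stream
            tuples until every registered query has consumed them."""

            def __init__(self, query_names):
                self.rows = []
                self.seen = defaultdict(set)   # row index -> queries that saw it
                self.queries = set(query_names)

            def append(self, tup):
                self.rows.append(tup)

            def evaluate(self, query_name, predicate):
                # Run a registered continuous query as an ordinary one-time
                # query over whatever currently sits in the basket.
                result = [r for r in self.rows if predicate(r)]
                for i in range(len(self.rows)):
                    self.seen[i].add(query_name)
                self._drop_consumed()
                return result

            def _drop_consumed(self):
                # A tuple leaves the basket once all relevant queries saw it.
                kept = [(r, self.seen[i]) for i, r in enumerate(self.rows)
                        if self.seen[i] != self.queries]
                self.rows = [r for r, _ in kept]
                self.seen = defaultdict(set, {i: s for i, (_, s) in enumerate(kept)})

        basket = Basket({"q1"})
        basket.append(("sensor1", 17.5))
        basket.append(("sensor2", 99.0))
        print(basket.evaluate("q1", lambda r: r[1] > 50))   # [('sensor2', 99.0)]
        print(basket.rows)                                  # [] - both consumed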

    MonetDB/DataCell: Online Analytics in a Streaming Column-Store

    Get PDF
    In DataCell, we design streaming functionalities in a modern relational database kernel which targets big data analytics. This includes exploitation of both its storage/execution engine and its optimizer infrastructure. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages for modern applications in need of online analytics, such as web logs, network monitoring and scientific data management. The major challenge then becomes the efficient support for specialized stream features, e.g., multi-query processing and incremental window-based processing, as well as exploiting standard DBMS functionalities, such as indexing, in a streaming environment. In this demo, we present the DataCell system, an extension of the MonetDB open-source column-store for online analytics. The demo gives the user the opportunity to experience the features of DataCell, such as processing both stream and persistent data and performing window-based processing. The demo provides a visual interface to monitor the critical system components, e.g., how query plans transform from typical DBMS query plans to online query plans, how data flows through the query plans as the streams evolve, how DataCell maintains intermediate results in columnar form to avoid repeated evaluation of the same stream portions, etc. The demo also provides the ability to interactively set the test scenarios regarding input data and various DataCell knobs.
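
    As a rough illustration of the mixed workload the demo targets, the hypothetical snippet below joins newly arrived stream tuples with a small persistent relation. All names and data are made up; the real system performs this inside the relational engine rather than in application code.

        # Persistent relation: sensor -> room (hypothetical data).
        persistent_sensors = {
            "sensor1": "room42",
            "sensor2": "lobby",
        }

        stream_basket = []             # buffers incoming readings

        def on_arrival(reading):
            stream_basket.append(reading)

        def continuous_join():
            # Evaluate the standing query over the current basket contents,
            # enriching each reading with persistent metadata.
            out = [(sensor, persistent_sensors.get(sensor, "unknown"), value)
                   for sensor, value in stream_basket]
            stream_basket.clear()      # tuples consumed by the only query
            return out

        on_arrival(("sensor1", 17.5))
        on_arrival(("sensor2", 21.0))
        print(continuous_join())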

    Enhanced Stream Processing in a DBMS Kernel

    Get PDF
    Continuous query processing has emerged as a promising query processing paradigm with numerous applications. A recent development is the need to handle both streaming queries and typical one-time queries in the same application. For example, data warehousing can greatly benefit from the integration of stream semantics, i.e., online analysis of incoming data and combination with existing data. This is especially useful to provide low latency in data-intensive analysis in big data warehouses that are augmented with new data on a daily basis. However, state-of-the-art database technology cannot handle streams efficiently due to their "continuous" nature. At the same time, state-of-the-art stream technology is purely focused on stream applications. The research efforts are mostly geared towards the creation of specialized stream management systems built with a different philosophy than a DBMS. The drawback of this approach is the limited opportunity to exploit successful past data processing technology, e.g., query optimization techniques. For this new problem, we need to combine the best of both worlds. Here we take a completely different route by designing a stream engine on top of an existing relational database kernel. This includes reuse of both its storage/execution engine and its optimizer infrastructure. The major challenge then becomes the efficient support for specialized stream features. This paper focuses on incremental window-based processing, arguably the most crucial stream-specific requirement. In order to maintain and reuse the generic storage and execution model of the DBMS, we elevate the problem to the query plan level. Proper op
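
    The paper's plan-level operators are not reproduced here, but the core intuition of incremental window-based processing can be sketched in a few lines: keep one partial aggregate per slide and combine them, so each new slide adds one partial and evicts one instead of recomputing the whole window. The class below is an assumed, simplified model of that idea, not the paper's actual operators.

        from collections import deque

        class SlidingSum:
            """Toy incremental sliding-window sum: the window is a queue of
            per-slide partial aggregates, so advancing the window touches
            only one new partial and one expired partial."""

            def __init__(self, window_slides):
                self.panes = deque()
                self.window_slides = window_slides
                self.total = 0.0

            def add_slide(self, values):
                pane = sum(values)                # aggregate only the new slide
                self.panes.append(pane)
                self.total += pane
                if len(self.panes) > self.window_slides:
                    self.total -= self.panes.popleft()   # evict expired slide
                return self.total                 # current window aggregate

        w = SlidingSum(window_slides=3)
        for slide in ([1, 2], [3], [4, 5], [6]):
            print(w.add_slide(slide))             # 3, 6, 15, 18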

    Benchmarking RDF Storage Engines

    Get PDF
    In this deliverable, we present version V1.0 of SRBench, the first benchmark for Streaming RDF engines, designed in the context of Task 1.4 of PlanetData and completely based on real-world datasets. With the increasing problem of too much streaming data but not enough knowledge, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding such data. Various approaches are emerging. To help researchers and users compare streaming RDF engines in a standardised application scenario, we propose SRBench, with which one can assess the ability of a streaming RDF engine to cope with a broad range of use cases typically encountered in real-world scenarios. We offer a set of queries that cover the major aspects of streaming RDF engines, ranging from simple pattern matching queries to queries with complex reasoning tasks. To give a first baseline and illustrate the state of the art, we show results obtained from implementing SRBench using the SPARQLStream query-processing engine developed by UPM.
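
    For a flavour of the "simple pattern matching" end of the query spectrum, the snippet below filters a window of sensor observations for hurricane-force wind readings. SRBench itself expresses such queries in a streaming SPARQL dialect over its real-world datasets; the Python version, its data and the threshold are merely illustrative stand-ins, not benchmark queries.

        # Window of (timestamp, sensor, property, value) tuples (made up).
        observations = [
            (100, ":station1", ":windSpeed", 12.0),
            (160, ":station2", ":windSpeed", 31.5),
            (220, ":station1", ":temperature", 18.2),
            (230, ":station3", ":windSpeed", 40.1),
        ]

        def hurricane_candidates(window, now, width, threshold=32.7):
            # "Which stations reported hurricane-force wind in the last
            # `width` time units?" (32.7 m/s is Beaufort force 12.)
            return {sensor for ts, sensor, prop, value in window
                    if now - width <= ts <= now
                    and prop == ":windSpeed" and value > threshold}

        print(hurricane_candidates(observations, now=240, width=100))  # {':station3'}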

    Continuous RDF Query Processing over DHTs

    No full text
    We study the continuous evaluation of conjunctive triple pattern queries over RDF data stored in distributed hash tables. In a continuous query scenario, network nodes subscribe with long-standing queries and receive answers whenever RDF triples satisfying their queries are published. We present two novel query processing algorithms for this scenario and analyze their properties formally. Our performance goal is to have algorithms that scale to large amounts of RDF data, distribute the storage and query processing load evenly and incur as little network traffic as possible. We discuss the various performance tradeoffs that occur through a detailed experimental evaluation of the proposed algorithms.
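
    A minimal single-process sketch of the continuous (publish/subscribe) setting described above: subscribers register long-standing triple patterns and each published triple is matched against them. The paper's contribution is distributing this matching over a DHT; the names and data below are assumptions for illustration only.

        subscriptions = {}    # subscriber -> (s, p, o) pattern, None = wildcard

        def subscribe(who, pattern):
            subscriptions[who] = pattern

        def publish(triple):
            # Match the newly published triple against every stored pattern.
            notified = []
            for who, pattern in subscriptions.items():
                if all(q is None or q == v for q, v in zip(pattern, triple)):
                    notified.append(who)   # in the real system: sent over the network
            return notified

        subscribe("node7", (None, ":hasReading", None))
        subscribe("node9", (":sensor2", None, None))
        print(publish((":sensor1", ":hasReading", '"17.5"')))   # ['node7']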

    A Query Language for a Data Refinery Cell

    No full text
    In this paper we propose the DataCell, an event management system designed as a flexible data hub in an ambient environment, and we mainly focus on its language interface. The DataCell provides an orthogonal extension to SQL'03, called "basket expressions", which behave as predicate windows over multiple streams and
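
    Although the abstract is cut short above, the notion of a predicate window can be sketched: tuples are selected from a stream buffer by content rather than strictly by arrival order or a fixed count/time window. The Python below is a loose, assumed illustration of that idea, not the SQL'03 extension itself.

        basket = [
            ("sensor1", 17.5),
            ("sensor2", 99.0),
            ("sensor1", 18.1),
            ("sensor3", 75.2),
        ]

        def predicate_window(buffered, predicate):
            # Split the buffer: tuples satisfying the predicate feed the query
            # now, the rest stay buffered for later evaluation.
            selected = [t for t in buffered if predicate(t)]
            remaining = [t for t in buffered if not predicate(t)]
            return selected, remaining

        hot, basket = predicate_window(basket, lambda t: t[1] > 50)
        print(hot)      # [('sensor2', 99.0), ('sensor3', 75.2)]
        print(basket)   # [('sensor1', 17.5), ('sensor1', 18.1)]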